Sampling Distribution

“One of the key concerns of statistics is the drawing of conclusions from a set of observed data. These data will usually consist of a sample of certain elements of a population, and the objective will be to use the sample to draw conclusions about the entire population.”

In this lecture, we will learn how to construct the distribution of a sample statistic (e.g., minimum, maximum, mean, median, proportion, standard deviation). The general procedure is as follows.

  1. Determine which parameter we would like to know from the population.
  2. Draw a random sample of size \(n\).
  3. Calculate the sample statistic.
  4. Repeat steps 2 and 3 a large number of times.
  5. Display all the sample statistics obtained in step 4 on the same graph.
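
The five steps above can be sketched in R. This is only a minimal illustration, not the course app: the population (10,000 values drawn uniformly between 0 and 100), the choice of the median as the statistic, the sample size of 25, the 1,000 repetitions, and all variable names are assumptions made for the example.

```r
set.seed(1)                      # make the simulation reproducible

# Hypothetical population: 10,000 values, uniform between 0 and 100
population <- runif(10000, min = 0, max = 100)

n    <- 25                       # step 2: sample size (assumed)
reps <- 1000                     # step 4: number of repetitions (assumed)

# Step 1 picks the statistic (here, the median).
# Steps 2-4: draw a sample, compute the statistic, and repeat.
medians <- replicate(reps, median(sample(population, n)))

# Step 5: display all the sample statistics on one graph
hist(medians,
     main = "Sampling distribution of the sample median (n = 25)",
     xlab = "Sample median")
```

Replacing `median` with `min`, `max`, `mean`, or `sd` gives the sampling distribution of a different statistic from the same population.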


Example 1 Suppose in a certain population, the amount of cash people have in their pockets is uniformly distributed between 0 and 100. Use the app to build the sampling distribution of the sample minimum.
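
As an alternative to the app, Example 1 could be simulated in R along the following lines. The example does not state a sample size, so `n <- 10` and the 5,000 repetitions are assumptions; the variable names are hypothetical as well.

```r
set.seed(2)

n    <- 10      # assumed sample size (not given in the example)
reps <- 5000    # assumed number of repeated samples

# Draw n cash amounts from Uniform(0, 100) and record the smallest one
mins <- replicate(reps, min(runif(n, min = 0, max = 100)))

hist(mins, breaks = 40,
     main = "Sampling distribution of the sample minimum (n = 10)",
     xlab = "Sample minimum")

# For a Uniform(0, 100) population the expected minimum is 100 / (n + 1)
c(simulated = mean(mins), theoretical = 100 / (n + 1))
```

The histogram is strongly right-skewed, a reminder that not every sample statistic has a bell-shaped sampling distribution.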



Example 2 Suppose in a certain population, the amount of cash people have in their pockets is approximately normally distributed, with nearly all amounts falling between 0 and 100. Use the app to build the sampling distribution of the sample mean.



Example 3 Pick another population distribution. Use the app to build the sampling distribution of the sample mean.



Central Limit Theorem

Central Limit Theorem:

For a random sample of size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of the sample mean, \(\overline{X}\), has a mean of \(\mu\) and a standard deviation of \(\dfrac{\sigma}{\sqrt n}\), and it is approximately normal when \(n\) is sufficiently large.

\[E(\overline{X})=\mu, \mbox{ and }SD(\overline{X})=\dfrac{\sigma}{\sqrt n}.\]


“… practically speaking, no matter how nonnormal the underlying population distribution is, the sample mean of a sample size of at least 30 will be approximately normal.”
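
A quick simulation can make the theorem concrete. The sketch below assumes a strongly right-skewed population (exponential with mean 10 and standard deviation 10), a sample size of 30, and 5,000 repetitions; none of these choices come from the lecture, and the variable names are hypothetical.

```r
set.seed(3)

rate  <- 0.1          # assumed exponential population
mu    <- 1 / rate     # population mean = 10
sigma <- 1 / rate     # population SD   = 10
n     <- 30           # sample size
reps  <- 5000         # number of repeated samples

# Sampling distribution of the sample mean
xbars <- replicate(reps, mean(rexp(n, rate = rate)))

# Compare the simulated mean and SD with mu and sigma / sqrt(n)
c(simulated = mean(xbars), theoretical = mu)
c(simulated = sd(xbars),   theoretical = sigma / sqrt(n))

# Share of sample means within 1.96 SDs of mu;
# an exactly normal distribution would give about 0.95
share_within <- mean(abs(xbars - mu) <= 1.96 * sigma / sqrt(n))
share_within

hist(xbars, breaks = 40,
     main = "Sample means from a skewed population (n = 30)",
     xlab = "Sample mean")
```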

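An Informal Proof of CLT.

The argument below justifies the mean and standard deviation stated in the theorem; it does not show that the shape of the sampling distribution is approximately normal.

Suppose \(X\) and \(Y\) are independent random variables (they do not need to be normal). The additive properties are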

\[E[X + Y] = E[X] + E[Y],\] \[Var[X + Y] = Var[X] + Var[Y].\]

For any constant \(c\), the constant multiple properties are \[E[\color{red}cX] = \color{red}cE[X],\] \[Var[\color{red}cX] = \color{red}{c^2} Var[X].\]


If the independent random variables \(X_1, X_2, \ldots, X_n\) are from the same population, whose mean is \(\mu\) and standard deviation is \(\sigma\), then

  • \(E[X_1]=E[X_2]=\cdots=E[X_n]=\mu\).
  • \(Var[X_1]=Var[X_2]=\cdots=Var[X_n]=\sigma^2\).

Therefore, \[E[\overline{X}] = E\left[\dfrac{X_1 + X_2 + \cdots + X_n}{n}\right] = \dfrac{E[X_1]+E[X_2] + \cdots + E[X_n]}{n}=\dfrac{n\mu}{n}=\mu.\]

\[Var[\overline{X}] = Var\left[\dfrac{X_1 + X_2 + \cdots + X_n}{n}\right] = \dfrac{Var[X_1]+Var[X_2] + \cdots + Var[X_n]}{n^2}=\dfrac{n\sigma^2}{n^2}=\dfrac{\sigma^2}{n}.\]

\[SD(\overline{X})=\sqrt{Var(\overline{X})}=\dfrac{\sigma}{\sqrt{n}}.\]

Example 4 Men’s weight is normally distributed with \(\mu = 172\) lb and \(\sigma = 29\) lb.

  1. If 1 man is randomly selected, find the probability that his weight is less than 167 lb.

  2. If 36 men are randomly selected, find the probability that their average weight is less than 167 lb.

  3. If 1 man is randomly selected, find the probability that his weight is between 170 and 175 lb.

  4. If 64 men are randomly selected, find the probability that their mean weight is between 170 and 175 lb.

  5. You are to design an elevator to safely hold 16 people. Find the maximum allowable weight if we want a 0.95 probability that this maximum will not be exceeded in the worst case when 16 randomly selected males are on it.
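
One possible way to set up the five parts of Example 4 in R uses `pnorm()` and `qnorm()`. The variable names are hypothetical. Part 5 is read here as asking for the 95th percentile of the total weight of 16 independently selected men, which is an interpretation of the question rather than something the example states.

```r
mu    <- 172   # population mean weight (lb)
sigma <- 29    # population standard deviation (lb)

# 1. One man: P(X < 167)
p1 <- pnorm(167, mean = mu, sd = sigma)

# 2. Mean of 36 men: P(Xbar < 167), with SD(Xbar) = sigma / sqrt(36)
p2 <- pnorm(167, mean = mu, sd = sigma / sqrt(36))

# 3. One man: P(170 < X < 175)
p3 <- pnorm(175, mu, sigma) - pnorm(170, mu, sigma)

# 4. Mean of 64 men: P(170 < Xbar < 175), with SD(Xbar) = sigma / sqrt(64)
p4 <- pnorm(175, mu, sigma / sqrt(64)) - pnorm(170, mu, sigma / sqrt(64))

# 5. Total weight of 16 men has mean 16 * mu and SD sqrt(16) * sigma
#    (assuming independent weights); take its 95th percentile
w95 <- qnorm(0.95, mean = 16 * mu, sd = sqrt(16) * sigma)

round(c(p1 = p1, p2 = p2, p3 = p3, p4 = p4), 4)
round(w95, 1)
```

Comparing `p1` with `p2`, and `p3` with `p4`, shows how averaging shrinks the spread: the same interval captures far more of the distribution of a mean than of a single weight.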




Book Example 7.2 An insurance company has 10,000 automobile policyholders. If the expected yearly claim per policyholder is $260 with a standard deviation of $800, approximate the probability that the total yearly claim exceeds $2.8 million.
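
A sketch of the normal approximation, assuming the 10,000 claims are independent. The symbols \(T\) (total yearly claim) and \(Z\) (a standard normal random variable) are introduced here for illustration.

\[E[T] = 10{,}000 \times 260 = 2{,}600{,}000, \qquad SD(T) = \sqrt{10{,}000} \times 800 = 80{,}000,\]

\[P(T > 2{,}800{,}000) \approx P\left(Z > \dfrac{2{,}800{,}000 - 2{,}600{,}000}{80{,}000}\right) = P(Z > 2.5) \approx 0.0062.\]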




Book Example 7.3 The blood cholesterol levels of a population of workers have mean 202 and standard deviation 14. If a sample of 36 workers is selected, approximate the probability that the sample mean of their blood cholesterol levels will lie between 198 and 206.
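
A sketch of the standardization, with \(Z\) denoting a standard normal random variable (a symbol introduced here for illustration):

\[SD(\overline{X}) = \dfrac{14}{\sqrt{36}} = \dfrac{7}{3} \approx 2.33,\]

\[P(198 < \overline{X} < 206) \approx P\left(\dfrac{198 - 202}{7/3} < Z < \dfrac{206 - 202}{7/3}\right) \approx P(-1.71 < Z < 1.71) \approx 0.913.\]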




Book Example 7.4 An astronomer is interested in measuring, in units of light-years, the distance from her observatory to a distant star. However, the astronomer knows that due to differing atmospheric conditions and normal errors, each time a measurement is made, it will yield not the exact distance, but an estimate of it. As a result, she is planning on making a series of 10 measurements and using the average of these measurements as her estimated value for the actual distance. If the values of the measurements constitute a sample from a population having mean d (the actual distance) and a standard deviation of 3 light-years, approximate the probability that the astronomer’s estimated value of the distance will be within 0.5 light-years of the actual distance.
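
A sketch of the approximation, with \(\overline{X}\) denoting the average of the 10 measurements and \(Z\) a standard normal random variable (notation introduced here for illustration). Because \(n = 10\) is small, the normal approximation is rougher than in the previous examples.

\[SD(\overline{X}) = \dfrac{3}{\sqrt{10}} \approx 0.949,\]

\[P\left(|\overline{X} - d| \le 0.5\right) \approx P\left(|Z| \le \dfrac{0.5}{0.949}\right) \approx P(|Z| \le 0.527) \approx 0.40.\]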




Sampling Distribution of Sample Proportion

Example 5 Construct a sampling distribution of sample proportion.

The data file ATL_Departure_Flights_2017.csv contains the flight status (on-time or delayed) of every domestic departure from Atlanta Hartsfield-Jackson Airport in 2017.

    departure <- read.csv("https://albums.yuanting.lu/sta126/ATL_Departure_Flights_2017.csv")

  1. How large is the dataset?
  2. What percentage of departure flights were on-time in 2017?
  3. Take a random sample of 50 flights. What percentage of departure flights were on-time in the sample?

    # Pick 50 rows at random (without replacement), then tabulate the proportions
    x <- departure$Status[sample(nrow(departure), 50)]
    table(x) / 50

  4. Repeatedly take 30 or more samples and create a distribution graph.

    # p1, ..., p30 stand for the sample proportions recorded from each sample
    phats <- c(p1, p2, p3, p4, ..., p30)
    stripchart(phats, method = 'stack',
            at = 0.15, offset = 0.5, xlim = c(0, 1))
  5. Draw 500 samples and create a distribution graph.

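    n <- 50                      # sample size
    pile <- rep(0, 500)          # storage for the 500 sample proportions
    for (i in 1:500) {
      # Draw n flights at random (without replacement) from all rows
      x <- departure$Status[sample(nrow(departure), n)]
      phat <- table(x) / n
      # Keep the second category's proportion; check names(phat) to
      # confirm that this is the on-time category
      pile[i] <- as.numeric(phat[2])
    }
    stripchart(pile, method = 'stack', pch = 19,
            at = 0.15, offset = 0.1, xlim = c(0, 1),
            main = "Sampling Distribution of Sample Proportion (n=50)",
            xlab = "Proportion of on-time departure flights")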

Example 6 Repeat the process in the previous example to build a sampling distribution of the sample proportion. This time, use a sample size of 200. Compare the sampling distribution graph with the one in the previous example. Which one has a wider spread?
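
The effect of sample size on the spread can also be previewed without the flight data. The sketch below simulates sample proportions from a hypothetical population in which 80% of flights are on time; that figure is an assumption for illustration, not the value in the 2017 dataset, and the variable names are hypothetical.

```r
set.seed(4)

p    <- 0.8     # assumed population proportion (NOT taken from the data)
reps <- 2000    # number of repeated samples

# The number of on-time flights in a sample of size n is roughly
# Binomial(n, p); dividing by n gives the sample proportion
phat_50  <- rbinom(reps, size = 50,  prob = p) / 50
phat_200 <- rbinom(reps, size = 200, prob = p) / 200

# Simulated spread versus the theoretical SD, sqrt(p * (1 - p) / n)
c(sd(phat_50),  sqrt(p * (1 - p) / 50))
c(sd(phat_200), sqrt(p * (1 - p) / 200))

stripchart(list("n = 50" = phat_50, "n = 200" = phat_200),
           method = "stack", pch = 19, offset = 0.02, xlim = c(0, 1),
           xlab = "Sample proportion")
```

Quadrupling the sample size from 50 to 200 should roughly halve the standard deviation of the sample proportion.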




Book Example 7.7 Suppose that exactly 46 percent of the population favors a particular candidate. If a random sample of size 200 is chosen, what is the probability that at least 100 favor this candidate?
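
One way to check this example in R is to compare the exact binomial probability with a normal approximation. The sketch assumes the number of supporters in the sample is Binomial(200, 0.46) and applies a continuity correction (using 99.5 as the cutoff); the variable names are hypothetical.

```r
n <- 200     # sample size
p <- 0.46    # population proportion favoring the candidate

# Exact: P(X >= 100) = P(X > 99) for X ~ Binomial(n, p)
p_exact <- pbinom(99, size = n, prob = p, lower.tail = FALSE)

# Normal approximation with mean n * p and SD sqrt(n * p * (1 - p)),
# using the continuity-corrected cutoff 99.5
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))
p_normal <- pnorm(99.5, mean = mu, sd = sigma, lower.tail = FALSE)

round(c(exact = p_exact, normal = p_normal), 4)
```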



